



Mechanistic Interpretability with SAEs: Probing Religion, Violence, and Geography in Large Language Models

Simbeck, Katharina, Mahran, Mariam

arXiv.org Artificial Intelligence

Despite growing research on bias in large language models (LLMs), most work has focused on gender and race, with little attention to religious identity. This paper explores how religion is internally represented in LLMs and how it intersects with concepts of violence and geography. Using mechanistic interpretability and Sparse Autoencoders (SAEs) via the Neuronpedia API, we analyze latent feature activations across five models. We measure overlap between religion- and violence-related prompts and probe semantic patterns in activation contexts. While all five religions show comparable internal cohesion, Islam is more frequently linked to features associated with violent language. In contrast, geographic associations largely reflect real-world religious demographics, revealing how models embed both factual distributions and cultural stereotypes. These findings highlight the value of structural analysis in auditing not just outputs but also internal representations that shape model behavior.
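
The overlap measurement between religion- and violence-related prompts can be illustrated with a minimal sketch. It assumes each prompt's SAE activations have already been retrieved (for example from Neuronpedia) as a mapping from feature id to activation strength. It scores overlap as the Jaccard similarity of the two prompts' top-k feature sets. The abstract does not name the authors' overlap metric, so the metric, the helper names, and the toy activation values are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical representation: SAE feature id -> activation strength for one prompt.
Activations = Dict[int, float]


def top_k_features(activations: Activations, k: int) -> Set[int]:
    """Return the ids of the k most strongly activated features."""
    ranked = sorted(activations.items(), key=lambda item: item[1], reverse=True)
    return {feature_id for feature_id, _ in ranked[:k]}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Intersection size divided by union size (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def feature_overlap(acts_a: Activations, acts_b: Activations, k: int = 20) -> float:
    """Jaccard overlap between the top-k feature sets of two prompts."""
    return jaccard(top_k_features(acts_a, k), top_k_features(acts_b, k))


# Toy activations invented for illustration; these are not data from the paper.
religion_prompt = {101: 0.9, 202: 0.5, 303: 0.1}
violence_prompt = {202: 0.8, 303: 0.7, 404: 0.2}
print(feature_overlap(religion_prompt, violence_prompt, k=2))  # → 0.3333333333333333
```

With k=2 the two prompts share only feature 202 out of three distinct top features, which gives an overlap of 1/3. Restricting the comparison to the top-k features keeps weak, incidental activations from inflating the overlap.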


The Rise of AI-Generated Content in Wikipedia

Brooks, Creston, Eggert, Samuel, Peskoff, Denis

arXiv.org Artificial Intelligence

The rise of AI-generated content in popular information sources raises significant concerns about accountability, accuracy, and bias amplification. Beyond directly impacting consumers, the widespread presence of this content poses questions for the long-term viability of training language models on vast internet sweeps. We use GPTZero, a proprietary AI detector, and Binoculars, an open-source alternative, to establish lower bounds on the presence of AI-generated content in recently created Wikipedia pages. Both detectors reveal a marked increase in AI-generated content in recent pages compared to those from before the release of GPT-3.5. With thresholds calibrated to achieve a 1% false positive rate on pre-GPT-3.5 articles, detectors flag over 5% of newly created English Wikipedia articles as AI-generated, with lower percentages for German, French, and Italian articles. Flagged Wikipedia articles are typically of lower quality and are often self-promotional or partial towards a specific viewpoint on controversial topics.
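
The 1% false-positive calibration can be sketched as follows. The threshold is the cutoff that at most 1% of pre-GPT-3.5 articles (treated as human-written) exceed. The flag rate on newer articles is the share of their scores above that cutoff. This is an assumed reading of the procedure: the abstract does not give the exact calibration code. The sketch also assumes higher detector scores mean "more likely AI-generated", and the scores shown are invented.

```python
import math
from typing import Sequence


def calibrate_threshold(reference_scores: Sequence[float], target_fpr: float = 0.01) -> float:
    """Pick a cutoff so that at most target_fpr of the reference set scores above it.

    Assumes higher scores mean "more likely AI-generated".
    """
    if not reference_scores:
        raise ValueError("need at least one reference score")
    ranked = sorted(reference_scores, reverse=True)
    allowed = math.floor(target_fpr * len(ranked))  # max reference items that may be flagged
    # ranked[allowed] is the (allowed + 1)-th largest score, so at most `allowed`
    # reference items lie strictly above it.
    return ranked[min(allowed, len(ranked) - 1)]


def flag_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores strictly above the threshold."""
    return sum(score > threshold for score in scores) / len(scores)


# Toy scores invented for illustration.
pre_gpt35 = [i / 1000 for i in range(1000)]
threshold = calibrate_threshold(pre_gpt35, target_fpr=0.01)
print(threshold, flag_rate(pre_gpt35, threshold))  # → 0.989 0.01
recent = [0.995, 0.5, 0.2, 0.999]
print(flag_rate(recent, threshold))  # → 0.5
```

Calibrating on articles written before GPT-3.5 fixes the false-positive rate, but it says nothing about how many AI-written articles fall below the cutoff. The flag rate on recent pages is therefore a lower bound rather than an estimate of the true share.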


How In-Context Learning Emerges from Training on Unstructured Data: On the Role of Co-Occurrence, Positional Information, and Noise Structures

Wibisono, Kevin Christian, Wang, Yixin

arXiv.org Machine Learning

Large language models (LLMs) like transformers have impressive in-context learning (ICL) capabilities; they can generate predictions for new queries based on input-output sequences in prompts without parameter updates. While many theories have attempted to explain ICL, they often focus on structured training data similar to ICL tasks, such as regression. In practice, however, these models are trained in an unsupervised manner on unstructured text data, which bears little resemblance to ICL tasks. To this end, we investigate how ICL emerges from unsupervised training on unstructured data. The key observation is that ICL can arise simply by modeling co-occurrence information using classical language models like continuous bag of words (CBOW), which we theoretically prove and empirically validate. Furthermore, we establish the necessity of positional information and noise structure to generalize ICL to unseen data. Finally, we present instances where ICL fails and provide theoretical explanations; they suggest that the ICL ability of LLMs to identify certain tasks can be sensitive to the structure of the training data.


REBUS: A Robust Evaluation Benchmark of Understanding Symbols

Gritsevskiy, Andrew, Panickssery, Arjun, Kirtland, Aaron, Kauffman, Derik, Gundlach, Hans, Gritsevskaya, Irina, Cavanagh, Joe, Chiang, Jonathan, La Roux, Lydia, Hung, Michelle

arXiv.org Artificial Intelligence

We propose a new benchmark evaluating the performance of multimodal large language models on rebus puzzles. The dataset covers 333 original examples of image-based wordplay, cluing 13 categories such as movies, composers, major cities, and food. To achieve good performance on the benchmark of identifying the clued word or phrase, models must combine image recognition and string manipulation with hypothesis testing, multi-step reasoning, and an understanding of human cognition, making for a complex, multimodal evaluation of capabilities. We find that proprietary models such as GPT-4V and Gemini Pro significantly outperform all other tested models. However, even the best model has a final accuracy of just 24%, highlighting the need for substantial improvements in reasoning. Further, models rarely understand all parts of a puzzle, and are almost always incapable of retroactively explaining the correct answer. Our benchmark can therefore be used to identify major shortcomings in the knowledge and reasoning of multimodal large language models.


Cognitive Semantic Communication Systems Driven by Knowledge Graph: Principle, Implementation, and Performance Evaluation

Zhou, Fuhui, Li, Yihao, Xu, Ming, Yuan, Lu, Wu, Qihui, Hu, Rose Qingyang, Al-Dhahir, Naofal

arXiv.org Artificial Intelligence

Semantic communication is envisioned as a promising technique to break through the Shannon limit. However, semantic inference and semantic error correction have not been well studied. Moreover, the error correction methods of existing semantic communication frameworks are uninterpretable and inflexible, which limits the achievable performance. In this paper, to tackle this issue, a knowledge graph is exploited to develop semantic communication systems. Two cognitive semantic communication frameworks are proposed for the single-user and multiple-user communication scenarios. Moreover, a simple, general, and interpretable semantic alignment algorithm for semantic information detection is proposed. Furthermore, an effective semantic correction algorithm is proposed by mining inference rules from the knowledge graph. Additionally, a pre-trained model is fine-tuned to recover semantic information. For the multi-user cognitive semantic communication system, a message recovery algorithm is proposed to distinguish the messages of different users by matching the knowledge level between the source and the destination. Extensive simulations on a public dataset demonstrate that the proposed single-user and multi-user cognitive semantic communication systems outperform benchmark communication systems in terms of data compression rate and communication reliability. Finally, we present results from realistic single-user and multi-user cognitive semantic communication systems, obtained by building a software-defined radio prototype.


Soft-labeling Strategies for Rapid Sub-Typing

Rosario, Grant, Noever, David, Ciolino, Matt

arXiv.org Artificial Intelligence

The challenge of labeling large example datasets for computer vision continues to limit the availability and scope of image repositories. This research provides a new method for automated data collection, curation, labeling, and iterative training with minimal human intervention for the case of overhead satellite imagery and object detection. At operational scale, the method scanned an entire city (68 square miles) in a grid search and predicted car color from space-based observations. A partially trained YOLOv5 model served as the initial inference seed, producing progressively more refined predictions over iterative cycles. Soft labeling here refers to accepting label noise as a potentially valuable augmentation that reduces overfitting and improves generalization to previously unseen test data. The approach takes advantage of a real-world case in which a cropped image of a car can automatically receive sub-type information (white or colorful) from pixel values alone, completing an end-to-end pipeline without overdependence on human labor.
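
The white/colorful sub-typing step can be sketched as a simple rule on pixel values. This is an assumption: the abstract only says the sub-type comes from pixel values alone. The HSV saturation/brightness rule and the thresholds `max_saturation` and `min_value` below are hypothetical choices, not the authors' method. Under this two-way split, dark or grey cars also fall into the "colorful" bucket.

```python
import colorsys
from typing import Iterable, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB values, 0-255


def mean_saturation_value(pixels: Iterable[Pixel]) -> Tuple[float, float]:
    """Average HSV saturation and value (brightness) over a crop's pixels."""
    sat_total = val_total = 0.0
    count = 0
    for r, g, b in pixels:
        _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        sat_total += s
        val_total += v
        count += 1
    if count == 0:
        raise ValueError("empty crop")
    return sat_total / count, val_total / count


def soft_label(pixels: Iterable[Pixel], max_saturation: float = 0.15, min_value: float = 0.75) -> str:
    """Label a car crop 'white' if it is bright and unsaturated on average, else 'colorful'.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    saturation, value = mean_saturation_value(pixels)
    if saturation <= max_saturation and value >= min_value:
        return "white"
    return "colorful"


white_car = [(250, 250, 250)] * 90 + [(200, 30, 30)] * 10  # a few stray red pixels
red_car = [(200, 30, 30)] * 100
print(soft_label(white_car), soft_label(red_car))  # → white colorful
```

Averaging over the whole crop means a few off-color pixels (shadows, windshields, background) do not flip the label. Crops that are mislabeled anyway become the kind of label noise the soft-labeling approach accepts.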


Digital Journal: A Global Digital Media Network

AITopics Original Links

Oroville - For the first time since it was completed in 1968, California's rain-swollen Oroville Dam overtopped its emergency spillway on Saturday, sending sheets of water down a forested hillside, adding mud and debris to the churning Feather River below.

Turmoil grows over White House correspondents' dinner
Washington - It is supposed to be a light-hearted gathering of journalists, celebrities and the president, where differences are put aside for good-natured jibes.

Review: Tenor Mario Frangoulis charms on 'Send Me An Angel' Special
International tenor Mario Frangoulis delivers on his live performance of "Send Me An Angel," which was filmed in Greece.

Germany to elect 'anti-Trump' Steinmeier as new president
Berlin - Billed as Germany's "anti-Trump", centre-left former foreign minister Frank-Walter Steinmeier is set to be elected Sunday as the new ceremonial head of state.

Konjic - Bosnia's fisheries watchdog gazes over an expanse of sand and mud, a space once occupied by a large thriving lake but recently emptied in the race for electricity production.


Japan's professional video game sector advances to next level

The Japan Times

The Japan eSports Association, which promotes the competitive playing of video games, is nudging the sector toward professional status. JeSPA uses the term e-sport to refer to video games ranging from shootout arcade games to team-based tournaments set on a virtual pitch. It is hard to put a figure on the number of enthusiasts worldwide, but around 100 million are thought to play regularly and seriously. The tournament wound up with finals in five games on March 12 and 13 in the Tokyo neighborhood of Toyosu. The finale was a round of the fighting game "Guilty Gear Xrd -Sign-," which attracted 350 players and roughly 1,000 spectators, while more than 10,000 people followed it on Dwango's Nico Nico Live website. The winner -- who goes by the name of Dogura -- told The Japan Times he considers video games to be "sport performed with your brain."